Landauer's erasure principle exposes an intrinsic relation between thermodynamics and information
theory: the erasure of information stored in a system, S, requires an amount of work proportional
to the entropy of that system. This entropy, H(S|O), depends on the information that a given
observer, O, has about S, and the work necessary to erase a system may therefore vary for different
observers. Here, we consider a general setting where the information held by the observer may be
quantum-mechanical, and show that an amount of work proportional to H(S|O) is still sufficient to
erase S. Since the entropy H(S|O) can now become negative, erasing a system can result in a net
gain of work (and a corresponding cooling of the environment).



I. PRELIMINARIES 

Statistical mechanics and information theory have a 
long-standing and intricate relation. A famous example
of this connection is Landauer's erasure principle [1],
used to exorcise Maxwell's demon [2]. According to this
principle, in order to perform irreversible operations on a 
system, like the erasure of a bit of information, we need 
to perform work on the system, which is dissipated as 
heat to the environment. The necessary amount of work 
is determined by our uncertainty about the system — the 
more we know about the system, the less it costs to 'erase' 
it. This result suggests that the seemingly elusive con- 
cept of 'information' is directly linked to a very concrete 
quantity, 'work'. Here, we analyse the relation between 
thermodynamics and information in a world that is fun- 
damentally quantum mechanical. 

Quantum information theory has peculiar properties 
that cannot be found in its classical counterpart. One 
example is that one's uncertainty about a system, as measured
by an entropy, can become negative [3]. This motivates
the following question: when our uncertainty about
a system is negative, can we gain work by erasing the in- 
formation stored in that system? Our results show that 
this is indeed possible; inherently non-classical aspects of 
quantum information theory, like negative uncertainty, 
are at a fundamental level part of thermodynamics. 



A. Physics from an information-theoretic viewpoint

Our knowledge about the state of physical systems is 
usually limited, because the number of parameters that 
we can measure and store, as well as our precision, are 
finite. A typical example is a gas: we cannot keep track of 
the state of each particle, but only of a few macroscopic 
parameters, such as the volume or pressure of the gas. 
Despite this restricted information, it is possible to make 
accurate predictions about the behavior of systems using 
tools of statistical mechanics [3H2] . 

Information constraints can also result in different observers
having considerably different knowledge about
the same physical reality. To illustrate this subjectiv- 
ity of information, consider an n-qubit system, S (e.g., n 
spin-1/2 particles). An observer, Alice, prepares the sys- 
tem in a known pure state. A second observer, Bob, does 
not know which state that is, but applies an energy mea- 
surement to the system. If S is degenerate, Bob remains 
ignorant about the exact state of the system. 

A natural way to quantify the knowledge of these ob- 
servers is to use entropy measures. The entropy of a 
system, S, given all the information available to a given 
observer, O, denoted by H(S|O), increases with the uncertainty
of the observer about the exact state of the
system. 1 In the case where S is fully degenerate, the
entropy of the system from the point of view of Alice is
zero, H(S|A) = 0, as she has complete knowledge of the
state of the system. On the other hand, Bob has maximal
entropy, H(S|B) = n, because he does not know in
which of the 2^n possible states the system is. 2

This observer-dependence of entropy seems to contradict
the traditional thermodynamic view, where entropy
appears as a property of the system rather than of the 
observer. However, the two views can be reconciled by 
introducing a standard observer who has access to a well- 
defined set of macroscopic parameters, but whose uncer- 
tainty about the state of the system is otherwise max- 
imal [3]. The idea is that the knowledge of this stan- 
dard observer corresponds, to good approximation, to the 



1 For concreteness, one may think of the von Neumann entropy,
which for a system, S, in state ρ, is defined by H(S)_ρ :=
-Tr(ρ log_2 ρ). However, most of this section is valid for any reasonable
entropy measure, and our technical statements will use
smooth min- and max-entropies [7]. These are generalizations of
the von Neumann entropy, and reduce to the latter for certain
'nicely behaved' distributions, e.g., in the thermodynamic limit
(see Appendix B for details). The subscript in H(S)_ρ can be
dropped if the state is clear from the context.

2 The entropy of S conditioned on the classical memory O,
H(S|O), can be defined as the expectation, taken over all states
of the memory, m_O, of the entropy of ρ_m, the state of S conditioned
on m_O: H(S|O) := E_m[H(S)_{ρ_m}].
knowledge we typically have about large systems in realistic
situations: in general, we do not know microscopic
details such as the spin direction of individual particles,
but only parameters like the energy of a system (in the
above example, it would make sense to take Bob as the
standard observer). One may nevertheless ask whether
the difference between the entropies H(S|A) and H(S|B)
has any physical significance. As we shall see, this is indeed
the case.



B. Quantum knowledge 

The observers we described require an internal mem- 
ory to store the information they have about the system 
S (for Alice this memory needs to be large enough to in- 
clude a full description of the state of S, while Bob only 
stores the value of the energy). It is often implicitly assumed
that this memory is classical. We go beyond this
classical scenario and consider observers who may have 
access to information about S that is itself represented 
as the state of a quantum system — a quantum memory. 

To illustrate the effects of a quantum memory, let us 
consider a third observer, Quasimodo. Quasimodo pre- 
pares each of the n particles of S such that it is maxi- 
mally entangled with a corresponding qubit of his quan- 
tum memory, Q. Note that this quantum memory is at 
least as useful as the classical data held by Alice. In fact, 
the latter may be recovered by applying a measurement 
on Quasimodo's memory. 

In order to quantify the uncertainty that Quasimodo 
has about S, we need entropy measures that account 
for the quantum-mechanical nature of the information 
he holds. In the field of quantum information, such 
measures are known as conditional entropies and gen- 
eralize classical conditional entropies. The conditional 
von Neumann entropy can be written as a difference, 
H(S|Q) = H(SQ) - H(Q). 3 Here, H(SQ) denotes the
von Neumann entropy of the joint state of the system,
S, and the quantum memory, Q. Since this joint state is
pure, its entropy is zero. On the other hand, the reduced
state of the memory, ρ_Q, is fully mixed, which corresponds
to the maximal entropy H(Q) = n. We therefore
find that, for Quasimodo, the conditional entropy is
negative, H(S|Q) = -n. Such negative entropies cannot
occur for purely classical observers like Alice and Bob. 
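The three observers can be compared numerically for a single qubit (n = 1). The following is a minimal sketch, assuming NumPy; the function names and the way Alice's and Bob's memories are modelled (a perfectly correlated classical record for Alice, an uncorrelated fixed state for Bob) are illustrative choices, not taken from the paper.

```python
import numpy as np

def von_neumann_entropy(rho):
    """Entropy in bits, H = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # drop numerical zeros (0 log 0 = 0)
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def memory_state(rho_sq):
    """Reduced state of the memory Q: trace out S (first tensor factor) of a state on S (x) Q."""
    return np.einsum("iqir->qr", rho_sq.reshape(2, 2, 2, 2))

def conditional_entropy(rho_sq):
    """H(S|Q) = H(SQ) - H(Q) for a two-qubit density matrix on S (x) Q."""
    return von_neumann_entropy(rho_sq) - von_neumann_entropy(memory_state(rho_sq))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])
ket00 = np.kron(ket0, ket0)
ket11 = np.kron(ket1, ket1)

# Quasimodo: S maximally entangled with the memory qubit.
bell = (ket00 + ket11) / np.sqrt(2)
rho_quasimodo = np.outer(bell, bell)

# Alice (assumed model): a classical record perfectly correlated with the state of S.
rho_alice = 0.5 * (np.outer(ket00, ket00) + np.outer(ket11, ket11))

# Bob (assumed model): S fully mixed, memory in a fixed state that carries no information.
rho_bob = np.kron(np.eye(2) / 2, np.outer(ket0, ket0))

h_quasimodo = conditional_entropy(rho_quasimodo)
h_alice = conditional_entropy(rho_alice)
h_bob = conditional_entropy(rho_bob)
```

Under these assumptions the three values reproduce the ordering discussed in the text: negative for Quasimodo, zero for Alice and maximal for Bob.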

This raises the question of whether these 'negative un- 
certainties' have any operational meaning. The answer 
is yes. They can be used to quantify, for instance, the 
amount of entanglement needed to send a state to a receiver
with side information, a task commonly referred
to as 'state merging' [3]. Another example where negative
conditional entropies play a crucial role was given



3 If Q were classical, this expression would be equivalent to
H(S|Q) := E_m[H(S)_{ρ_m}], as before.



recently in the context of Heisenberg's uncertainty princi- 
ple. The principle bounds the minimum uncertainty one 
has about the outcome of a measurement on a system, 
S, chosen from two complementary observables, e.g., a 
spin measured in the X or Z basis. 4 This bound is, however,
violated if quantum information about the initial
state of S is available. It was shown that this violation
can be quantified by the negativity of the entropy of S
conditioned on the memory [10]. 5

In this work, we go one step further and establish a 
relation between a physical quantity (namely the work 
necessary to 'erase' the state of a system) and the condi- 
tional entropy. Remarkably, the validity of this relation 
extends to the quantum regime and, in particular, yields 
a direct thermodynamical interpretation of negative con- 
ditional entropies. 



C. Information-work relation

In this section we illustrate Landauer's erasure princi- 
ple and express it in terms of conditional entropies. The 
process of erasing a system is defined as taking the system
to a pre-defined pure state, |0⟩. Note that while
erasing a system leads to the loss of information that
could be encoded there, it may also reduce our uncertainty
about the system (if we did not know the previous
state of the system, now we are sure that it is |0⟩).

For a concrete example of how to erase a bit, consider 
a spin-1/2 particle exposed to a tunable magnetic field
that can be adjusted to manipulate the energy of states
|↓⟩ and |↑⟩, according to a Hamiltonian like H_B = γ B · s.
Initially, the magnetic field is turned off, so the system
is degenerate. We define 'erasing' as taking the spin to
the pure state |0⟩ := |↓⟩. Let us see how two different
observers could do this.

Our first observer, Alice, knows that the particle is in 
a pure state, for instance |↑⟩. In order to take the particle
to |↓⟩, she may apply a unitary operation, in this case a
NOT gate. This operation is reversible and has no energy
cost.

The second observer, Bob, has no information about 
the initial state of the system, describing it as a fully
mixed state, 𝟙/2. One strategy he can follow to erase the
bit is to couple the particle to a heat bath and slowly
increase the magnetic field, raising the energy of state |↑⟩
until its occupation decays, as shown in Fig. 1. This erasure
process has an energy cost of kT ln 2, where T is the
temperature of the bath and k the Boltzmann constant.
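The cost kT ln 2 of Bob's strategy can be checked numerically: while the level |↑⟩ is raised quasi-statically, its thermal occupation is [1 + e^{E/kT}]^{-1}, and the work is the integral of that occupation over the energy. The sketch below uses only the Python standard library; the cutoff energy and the number of steps are arbitrary numerical choices, not part of the paper.

```python
import math

def occupation_up(energy, kT):
    """Thermal occupation of the raised level at energy `energy` (lower level at 0)."""
    return 1.0 / (1.0 + math.exp(energy / kT))

def erasure_work_numeric(kT, e_max_in_kT=40.0, steps=200_000):
    """Trapezoidal estimate of the integral of occupation_up(E) dE from 0 to e_max.

    The upper limit replaces infinity; beyond 40 kT the occupation is negligible.
    """
    e_max = e_max_in_kT * kT
    dE = e_max / steps
    total = 0.5 * (occupation_up(0.0, kT) + occupation_up(e_max, kT))
    for i in range(1, steps):
        total += occupation_up(i * dE, kT)
    return total * dE

work_in_units_of_kT = erasure_work_numeric(kT=1.0)  # compare with math.log(2)
```

With kT = 1 the numerical value agrees with ln 2 to well within the discretization error.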



4 More precisely, in its formulation proposed by Deutsch [8] and
Maassen and Uffink [9], the principle asserts that H(X|O) +
H(Z|O) ≥ log_2(1/c), where O is any classical description of the
initial state of S, and where log_2(1/c) is a measure for the
non-commutativity of the observables X and Z.

5 In the generalized form where O may be non-classical, the relation
reads H(X|O) + H(Z|O) ≥ log_2(1/c) + H(S|O).
FIG. 1: Erasing a fully mixed qubit. a) We start from a fully mixed state in a degenerate system. The filling of each circle
represents the probability, ⟨n_{↓/↑}⟩, that the system is in the respective state. b) We couple the system to a heat bath at
temperature T and slowly raise the energy of state |↑⟩. Thermalized by the bath, the system equilibrates in a Gibbs state of
temperature T. As the energy of |↑⟩ increases, it becomes less occupied, according to ⟨n_↑⟩(E) = [1 + e^{E/kT}]^{-1}. We continue
raising that level until it is empty. The total cost of this operation is ∫_0^∞ ⟨n_↑⟩(E) dE = kT ln 2. c) Finally, we isolate the
system and lower the energy of state |↑⟩. Since the state is empty, this operation is energy neutral.



More generally, in a hybrid setting where the sys- 
tem, S, may be quantum mechanical but the information 
about it is classical, the work, W(S), required to erase S 
is given by 

\[ W(S) = H(S)\, kT \ln 2 . \tag{1} \]

Crucially, Eq. (1) relates work to a quantity that is, according
to our discussion above, dependent on an observer.
This apparent contradiction is resolved by reconsidering
the meaning of W(S). Note that in order to erase a system,
we need to design an experimental setup that can,
and in general must, depend on the knowledge we have
about it. Hence, rather than describing W(S) simply as
the 'amount of work one needs to perform to erase system
S', one may interpret it as the 'amount of work that an
observer with memory O needs to erase S', and denote
it by W(S|O). For an observer with a classical memory,
O_c, 6 we have in general

\[ W(S|O_c) = H(S|O_c)\, kT \ln 2 . \tag{2} \]

We emphasize that this formula does not contradict
Eq. (1). Instead, it makes it explicit that the relevant quantities
may depend on the knowledge of the observer and,
in particular, may differ for different observers (in our example,
Alice had zero entropy and consequently erased



6 In the literature on Landauer's erasure principle the system to 
be erased is sometimes referred to as a 'memory'. However, for 
the sake of clarity we reserve the term 'memory' exclusively for 
the observer's memory resources. 



the bit at zero cost, while Bob had H(S|B) = 1 and had
to perform work kT ln 2; see also [H] for a discussion).
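To get a feeling for the magnitudes involved, Eq. (2) can be evaluated in SI units. This is a small sketch using only the standard library; the temperature of 300 K is an assumed illustrative value and the function name is hypothetical.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant, exact in the 2019 SI

def erasure_work_joules(conditional_entropy_bits, temperature_kelvin):
    """Work cost W(S|O_c) = H(S|O_c) kT ln 2, with the entropy given in bits."""
    return conditional_entropy_bits * BOLTZMANN_J_PER_K * temperature_kelvin * math.log(2)

ROOM_TEMPERATURE_K = 300.0  # assumed value, for illustration only

work_alice = erasure_work_joules(0.0, ROOM_TEMPERATURE_K)  # H(S|A) = 0
work_bob = erasure_work_joules(1.0, ROOM_TEMPERATURE_K)    # H(S|B) = 1
```

At this temperature Bob's cost per bit is of the order of 10^-21 J, while Alice's vanishes.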

Our contribution is to generalize this relation to the 
fully quantum case. We will be able to analyse what ob- 
servers with quantum memories can do to erase a system, 
and how much that costs them. 



II. THE GENERAL RELATION BETWEEN INFORMATION AND WORK

In this section we state and explain our main result, 
a general relation between the work necessary to erase a 
system and the information one has about this system. 

Several approaches have been proposed in the past to 
formalize the idea of a thermal process and to study erasure,
work extraction and their relation to Maxwell's demon
[1, 12-21]. This has spurred a rather extensive literature
(for overviews see [22-25]) as well as debates (see,
e.g., [HHSSHIE]). Correlations and entanglement can affect
erasure and work extraction, as has been noted by
several authors. For instance, in [25] the system to be 
erased is bipartite and the observer is restricted to lo- 
cal operations and classical communication (LOCC); the 
difference between quantum and classical 'demons' is ad- 
dressed in [30] ; see also [3T] for a discussion on 'local' and 
'global' demons in the context of the thermodynamic ar- 
row of time. 

Here, we consider a setting as depicted in Fig. 2, where
an observer, who has a quantum memory, O, tries to
erase a system, S, using a heat bath at temperature T
and performing operations on S and O (which are not
restricted to LOCC). We assume that the initial Hamiltonian
of S and O is fully degenerate. Details on the
setting can be found in Appendix A.

FIG. 2: Our setting: an observer, here represented by a machine with a quantum memory (O), will erase a system, S, using a
heat bath at temperature T. The observer can store and withdraw energy from a battery. The rest of the universe is represented
by the reference system.

Since the memory O is quantum mechanical, access- 
ing it may in general change its content. Also, there is 
no reason why the memory would only contain informa- 
tion about S; it could also carry information about other 
systems. Here we take a cautious position and require 
that those memory contents are kept intact in the era- 
sure process. Note that this requirement is crucial, since 
the contents may generally be needed for other purposes, 
e.g., if the erasure of S is part of a larger procedure. As 
a simple example, suppose we erase system S, and later 
possibly would like to erase another system Z. If the 
erasure of S removed the information about Z, the sub- 
sequent erasure of Z could become unnecessarily costly. 

In order to specify this memory preservation condition 
on a formal level, it is convenient to introduce a 'reference 
system' R, which models all systems other than S that 
the memory can have information about. To guarantee 
that the information about R is unaltered, we assume 
that the joint state of the memory and the reference, 
ρ_OR, is preserved by the erasure process and that system
R is not touched. 



A. A special case 

The general idea of what an observer with a quan- 
tum memory can do to erase a system and gain work 
in the process can be illustrated with a simple example.
Consider a single qubit system S, and an observer,
Quasimodo, who has a memory formed by two qubits,
Q = Q_1 ⊗ Q_2. The first qubit is maximally entangled
with S, in state |Q_1 S⟩, while the second is maximally
entangled with a qubit of the reference system, R, in
state |Q_2 R⟩. Quasimodo will try to erase S but keep
his memory about R intact, preserving the joint state
ρ_QR = (𝟙/2)_{Q_1} ⊗ |Q_2 R⟩⟨Q_2 R|. Note that the reduced state
of Q_1 is fully mixed, because |Q_1 S⟩ is maximally entangled.

In a first step, Quasimodo uses the two-qubit pure state
|Q_1 S⟩ and a heat bath at temperature T to extract work
2 kT ln 2, as described in Fig. 3. The system formed by
Q_1 and S is left in a fully mixed state. In particular, the
reduced state of Q_1 is fully mixed, which implies that
the joint state of the memory and the reference is still
ρ_QR. Quasimodo then erases the fully mixed qubit S, like
Bob did in Section I C, performing work kT ln 2. The net
work gain of the whole procedure is kT ln 2. Note that
if Quasimodo had not preserved his memory and later
wanted to erase R, he would have to perform unnecessary
work.

This case illustrates how the relation between entropy 
and the work necessary to erase a system applies in a 
quantum scenario: Quasimodo had negative conditional 
entropy about S, H(S|Q) = -1, which resulted in negative
work cost for erasure, W(S|Q) = -kT ln 2.
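The bookkeeping of this special case can be reproduced in a few lines. The sketch below, assuming NumPy, only tracks the pair (Q_1, S); the qubits Q_2 and R are left out because they factor out of the state and are not touched. All variable and function names are illustrative.

```python
import numpy as np

def entropy_bits(rho):
    """Von Neumann entropy in bits."""
    eigenvalues = np.linalg.eigvalsh(rho)
    eigenvalues = eigenvalues[eigenvalues > 1e-12]  # 0 log 0 = 0
    return float(-np.sum(eigenvalues * np.log2(eigenvalues)))

def reduced_memory(rho_qs):
    """Reduced state of Q_1: trace out S (second tensor factor) of a state on Q_1 (x) S."""
    return np.einsum("qsrs->qr", rho_qs.reshape(2, 2, 2, 2))

def entropy_s_given_q(rho_qs):
    """H(S|Q_1) = H(Q_1 S) - H(Q_1)."""
    return entropy_bits(rho_qs) - entropy_bits(reduced_memory(rho_qs))

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

# Before: Q_1 and S in a maximally entangled pure state.
bell = (np.kron(ket0, ket0) + np.kron(ket1, ket1)) / np.sqrt(2)
rho_before = np.outer(bell, bell)

# After: Q_1 still fully mixed, S erased to |0>.
rho_after = np.kron(np.eye(2) / 2, np.outer(ket0, ket0))

# Work balance in units of kT ln 2, following the two steps in the text.
work_extracted = 2.0   # from the two-qubit pure state
work_for_erasure = 1.0 # erasing the fully mixed qubit S
net_work_cost = work_for_erasure - work_extracted

h_before = entropy_s_given_q(rho_before)
h_after = entropy_s_given_q(rho_after)
memory_preserved = np.allclose(reduced_memory(rho_before), reduced_memory(rho_after))
```

The net cost in units of kT ln 2 coincides with the initial conditional entropy, the reduced state of Q_1 is unchanged, and the final conditional entropy is zero, so the procedure cannot be repeated for further gain.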

Naturally, the energy 'gained' in this process comes 
from the heat bath. As Quasimodo not only extracted 
work but also took S to a pure state, while leaving ρ_QR
intact, one may at first sight fear that he has violated
the second law of thermodynamics. This is, however, not
the case, since those gains are balanced by the reduction
in correlations between S and Q. In fact, the entropy
of the global state, H(QSR), increased, and erasing S
made Quasimodo lose all the entanglement between his
memory and S. His knowledge about the final state of S
is only classical: it can be expressed by a non-negative
conditional entropy, H(S|Q) = 0. This prevents him
from gaining more work if he erases S again, using this 
process in a perpetual motion scheme. The same obser- 
vation also explains why a negative cost of erasure would 
not enable Maxwell's demon to violate the second law. 



B. Single-shot erasure 

In general, the work required to erase a system is a 
random variable, i.e., the cost of erasure may fluctuate 
each time it is performed. Here we characterize a single 
instance of erasure with a probabilistic statement, and
in Section II C we will consider the average work cost of
erasure in a thermodynamic limit.

FIG. 3: Extracting work from an ℓ-qubit system in a pure state. This process can be seen as the reverse of erasure (Fig. 1).
a) Only one state is occupied, at energy E_0; the energy of the empty levels is raised to a very high value at zero cost. b) We
couple the system to the bath and slowly decrease the energy of the empty states. These will become gradually populated
according to the Gibbs distribution. Lowering the partially occupied states results in an energy gain of ℓ kT ln 2 in total. This
energy is stored in the battery. c) At the end of the procedure, the system is degenerate and fully mixed.

Theorem 1 guarantees that the cost of erasing a system
does not exceed a bound given in terms of the entropy of 
S conditioned on O, except with a small probability. 

Theorem 1. There exists a process to erase a system S,
conditioned on a memory, O, and acting at temperature
T, whose work cost satisfies

\[ W(S|O) \leq \left[ H_{\max}^{\varepsilon}(S|O) + \Delta \right] kT \ln 2, \tag{3} \]

except with probability less than δ = √(2^{-Δ/2} + 12ε),
∀ δ, ε > 0.

The quantity H_max^ε(S|O) denotes the ε-smooth max-entropy
of system S conditioned on the quantum memory
O, a single-shot generalization of the von Neumann
entropy [7]. In particular, as we shall see, this quantity
reduces to the von Neumann entropy in a thermodynamic
limit (we refer to Appendix B for the definition and properties
of smooth entropies).

The term Δ can be chosen to be small, and in the
limit of large systems could be neglected. For instance,
to allow a maximum probability of failure of only δ =
3%, one pays a price of approximately 20 kT ln 2 in the
work consumption of the process (in addition to the one
dictated by the entropy).
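The trade-off between the overhead Δ and the failure probability δ follows from inverting the relation δ = √(2^{-Δ/2} + 12ε) of Theorem 1. The sketch below uses only the standard library; the function names are hypothetical, and the example takes the smoothing parameter ε to be negligible, which is an assumption made purely for illustration.

```python
import math

def failure_probability(overhead_bits, eps):
    """delta = sqrt(2^(-Delta/2) + 12 eps), with Delta given in bits."""
    return math.sqrt(2.0 ** (-overhead_bits / 2.0) + 12.0 * eps)

def overhead_bits(failure_prob, eps):
    """Invert the relation above: Delta = -2 log2(delta^2 - 12 eps)."""
    margin = failure_prob ** 2 - 12.0 * eps
    if margin <= 0.0:
        raise ValueError("need delta^2 > 12 eps for a finite overhead")
    return -2.0 * math.log2(margin)

# Overhead, in units of kT ln 2, for a 3% failure probability and negligible eps.
overhead_for_3_percent = overhead_bits(0.03, 0.0)
```

For δ = 3% this gives an overhead slightly above 20, consistent with the figure of approximately 20 kT ln 2 quoted in the text.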

Theorem 1 implies that an observer with a quantum
memory entangled with S (i.e., with H_max^ε(S|O) < 0) can
erase the system with negative work cost, actually extracting
work in the process. Note that this is more general
than the example of Section II A, where S was, conveniently,
maximally entangled with a part of the memory:
Theorem 1 implies that observers can make full use of
the correlations between S and O, even if those are not
present in the neat form of maximally entangled qubits.

As a byproduct of the proof of Theorem 1, we find an
analogous result for work extraction. The goal of this 
process is to extract work from a system, S, under the 
assumption that the memory is kept intact (while the final 
state of S is arbitrary). 

Corollary 1. Given an n-qubit system S and a memory
O, there exists a work extraction process acting at
temperature T, such that the extracted work satisfies

\[ W_e(S|O) \geq \left[ n - H_{\max}^{\varepsilon}(S|O) - \Delta \right] kT \ln 2, \]

except with a probability of at most δ = √(2^{-Δ/2} + 12ε),
∀ δ, ε > 0.

C. Thermodynamic limit 

We typically expect thermal fluctuations to disappear 
in macroscopic systems. Theoretically this is usually 
handled by taking a thermodynamic limit, where we in 
some sense increase the size of the system such that fluctuations
are averaged away. In order to define a thermodynamic
limit in our scenario, we imagine performing the
erasure on a large collection of independent systems.

We define the work cost rate of an erasure process as 
the average work cost of the process in this limit, 

\[ w(S|O) = \lim_{n \to \infty} \frac{1}{n}\, W(S^{\otimes n}|O^{\otimes n}). \]

This quantity can be evaluated if we perform the erasure
of many copies of a system. To understand the implications
of our claim in such a situation, we use a well-known
statement from information theory, the Asymptotic
Equipartition Property (AEP) [32]. The quantum
version of this result essentially asserts that, for n-partite
states that consist of many identical copies of the
same single subsystem state, the smooth max-entropy
per copy converges towards the von Neumann entropy
(see Appendix B).

FIG. 4: Information compression, as used in the first step of our proof: a subsystem S_1 is decoupled from T. The size of S_1
decreases with the strength of the correlations between S and T, and therefore increases with correlations between S and the
memory, O (see Appendix B). Since the global state is pure, S_1 is purified by a system P of equal size that belongs to the
remaining systems, S and O. The state of S_1 ⊗ P is fully entangled. The arrows symbolize correlations between the different
systems.

The work cost rate can now be evaluated using Theorem 1
combined with the AEP, leading to the following result.

Corollary 2. There exists a process to erase a system S,
conditioned on a memory, O, and acting at temperature
T, with work cost rate

\[ w(S|O) \leq H(S|O)\, kT \ln 2. \]
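The flavour of the AEP can be illustrated in a purely classical toy case, which is not the quantum smooth max-entropy used in the theorem. For n independent biased bits, the sketch below counts how many strings are needed to capture probability 1 - ε, adding strings in order of decreasing probability, and compares log2 of that count per copy with the Shannon entropy. It uses only the standard library (Python 3.8+ for math.comb); the bias, ε and function names are illustrative assumptions.

```python
import math

def binary_entropy(p):
    """Shannon entropy of a biased bit, in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bits_per_copy_to_cover(n, p, eps):
    """log2(number of n-bit strings needed to reach probability >= 1 - eps) / n.

    Assumes p < 1/2, so strings with fewer ones are more probable. Strings are
    added one whole type class (fixed number of ones) at a time, so the count
    slightly overestimates the minimum.
    """
    covered = 0.0
    count = 0
    for k in range(n + 1):
        # log-probability of the whole class of strings with exactly k ones
        log_mass = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                    + k * math.log(p) + (n - k) * math.log(1 - p))
        covered += math.exp(log_mass)
        count += math.comb(n, k)
        if covered >= 1.0 - eps:
            break
    return math.log2(count) / n

p, eps = 0.1, 0.01
rate_small = bits_per_copy_to_cover(20, p, eps)
rate_large = bits_per_copy_to_cover(2000, p, eps)
shannon = binary_entropy(p)
```

As n grows, the per-copy rate moves towards the Shannon entropy from above, which is the classical counterpart of the convergence invoked for Corollary 2.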



III. OUTLINE OF THE PROOF 

We prove our result by providing an explicit process 
that satisfies the bound of Theorem 1. We assume (without
loss of generality) that S is an n-qubit system. The
erasure process consists of three main steps: 

1. We manipulate S in order to compress the correlations
between the memory and S into a pure state
of a subsystem of S ⊗ O that has approximately
n - H_max(S|O) qubits. This state is maximally entangled
between two subsystems of S ⊗ O, like in
the case of Quasimodo, from the example of Section II A.

2. We use that pure state to extract roughly
[n - H_max(S|O)] kT ln 2 work (kT ln 2 per qubit).

3. Finally, we erase system S, performing work
n kT ln 2 (again, kT ln 2 per qubit).

We now describe these three steps in more detail, re- 
ferring to technical proofs that can be found in the ap- 
pendices when necessary. 

In the first step, we show, using decoupling results [3J 
133] that, after an appropriate transformation, the first 



ℓ/2 qubits of S are almost (up to a probability determined
by δ) uncorrelated to the collection, T, of systems
outside S and O (see Appendix C 1 for details), with

\[ \ell \geq n - H_{\max}^{\varepsilon}(S|O) + 2 \log_2\!\left(\delta^2 - 12\varepsilon\right). \tag{4} \]

These ℓ/2 qubits form the subsystem S_1. As illustrated
in Fig. 4, the fact that S_1 is decoupled from T implies that
there is an (ℓ/2)-qubit subsystem, P, of S ⊗ O such that
the state of S_1 ⊗ P is δ-close to a pure, fully entangled
state (details in Appendix C 2).

In a second step, the observer extracts work ℓ kT ln 2
from the state of S_1 ⊗ P using a heat bath at temperature
T, as described in Fig. 3 and Appendix D. The system
S_1 ⊗ P is left in a fully mixed state. Note that the state
used was maximally entangled, so the reduced states of
S_1 and P were already fully mixed before this step. In
particular, the part of the memory involved in work extraction
is not changed. The observer did not touch the
memory before this second step and will not use it again,
which implies that the reduced state of memory and reference,
ρ_OR, is preserved by the erasure process. It is
shown in Appendix D that the probability of failure of
work extraction is upper bounded by δ. The work extraction
process of Corollary 1 ends here.

In the last step of the erasure process, the observer uses
energy from the battery to erase system S, as described
in Fig. 1, performing work n kT ln 2. The net work gain
of the whole process is (ℓ - n) kT ln 2, i.e., its work cost is
(n - ℓ) kT ln 2. The logarithmic term in Eq. (4) is usually
negative, because we choose δ and ε to be small; writing
Δ := -2 log_2(δ^2 - 12ε), the work consumption of the
process is W(S|O) ≤ [H_max^ε(S|O) + Δ] kT ln 2.
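The three steps of the outline amount to simple bookkeeping, sketched below in units of kT ln 2 using only the standard library. The function name and the example numbers (n, the value of the smooth max-entropy, δ and ε) are hypothetical, and the sketch ignores the fact that ℓ/2 must be an integer number of qubits.

```python
import math

def erasure_work_budget(n_qubits, h_max_eps_bits, failure_prob, eps):
    """Work bookkeeping for the three-step process, in units of kT ln 2.

    Uses the lower bound of Eq. (4) for the size l of the pure entangled state.
    """
    margin = failure_prob ** 2 - 12.0 * eps
    if not 0.0 < margin <= 1.0:
        raise ValueError("need 0 < delta^2 - 12 eps <= 1")
    pure_qubits = n_qubits - h_max_eps_bits + 2.0 * math.log2(margin)  # l, Eq. (4)
    extracted = pure_qubits       # step 2: one unit per qubit of the pure state
    erasure = float(n_qubits)     # step 3: one unit per qubit of S
    return {
        "pure_qubits": pure_qubits,
        "extracted": extracted,
        "erasure": erasure,
        "net_cost": erasure - extracted,
    }

# Hypothetical example: 100 qubits, strongly entangled with the memory.
budget = erasure_work_budget(n_qubits=100, h_max_eps_bits=-60.0,
                             failure_prob=0.03, eps=1e-6)
overhead = -2.0 * math.log2(0.03 ** 2 - 12.0 * 1e-6)  # Delta for these parameters
```

In this example the net cost equals the assumed entropy plus the overhead Δ and is negative, so the observer gains work overall.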



IV. CONCLUSIONS 

We have shown that conditional entropies, as mea- 
sures of the uncertainty that an observer has about a 
system, have a direct physical significance in statistical 
mechanics. These results complement previous findings 
that conditional entropies have an operational meaning 
within information theory [3, 10]. More specifically, we
have introduced an erasure process that uses the quantum
information that an observer has about a system to
erase the latter. The work cost of this erasure process
depends on conditional entropies, and a curious implica- 
tion of our findings is that negative entropies correspond 
to a negative work cost of erasure. We have also seen 
that an observer with a quantum memory can extract 
twice as much work from a system as one with a classical 
memory. 

The strengthened connection between information the- 
ory and statistical mechanics may allow us to interchange 
concepts between the two areas. An example is the proof 
of our results, as an essential part is played by decoupling,
which has been shown to be a very powerful information-theoretic
primitive [33J 133]. The following observation
suggests that we may also transfer ideas in the
other direction. Intuitively, it appears rather clear that 
observers cannot extract more work by locally process- 
ing data in their memory. Combined with our bounds 
for work extraction, this gives an alternative 'thermody- 
namic' derivation, as well as interpretation, of the data 
processing inequality (also known as strong subadditiv- 
ity) which, in information theory, is a crucial and non- 
trivial result. 

Our work can be related to discord, a quantity originally
introduced in the context of open systems theory
and decoherence [35, 36], and also intensively studied
in quantum information theory [37, 38]. Discord
quantifies the difference between the uncertainty about
a system, S, for an observer that possesses a quantum
memory, O_q, and one that has only a classical memory,
O_c, obtained by performing a measurement on O_q,
δ(S|O) = H(S|O_q) - H(S|O_c). Similarly to [3D1[3J, our
results suggest that δ(S|O) kT ln 2 can be interpreted as
the difference between the work cost of an erasure procedure
that makes full use of the quantum nature of the
memory and a process that is restricted to the classical 
properties of that memory. In fact, since our relation be- 
tween work and entropy is valid for a single instance of 
an erasure process, one may consider a generalized defini- 
tion of discord based on the smooth max-entropy, which 
retains its operational meaning in the single-shot case. 



A. Applications 

Our result can also have implications for the fundamental
limits of computation. Today, one of the major
challenges to the miniaturization of circuitry for high-performance
computing lies in heat generation. As
circuits become more compact, the heat generated per
unit area of circuitry is rapidly becoming difficult to
handle. Although our investigation certainly cannot help
with the practical issues, it might nevertheless be ex- 
tended to a theory that provides the ultimate bounds on 
dissipation. As is well known, computation per se can 
be made reversible [40] [41] . However, this comes at the 
expense of keeping extra information about the compu- 
tation in a memory. Whenever we wish to erase a part of 
this memory, Landauer's erasure principle dictates that
this unavoidably comes at the cost of generating heat.

A very common scenario in a computation is that we 
wish to erase a part of a memory, but keep the rest of the 
memory intact. How much work do we need to dissipate 
in order to do this? The naive answer would be that the 
cost is given by the entropy solely of the part of the mem- 
ory to be erased. However, our analysis shows that one 
can do better, namely that the required work is upper- 
bounded by a conditional entropy, which in general can 
be much smaller. 

Note that our result requires almost perfect control of 
the quantum systems involved, and one may wonder why 
we should consider such a theoretical idealization. As an 
analogue one can think of the Carnot cycle. Although the
ideal performance of the Carnot engine is in many cases
practically unattainable, it nevertheless
provides the theoretical foundation in terms of which the
performance of heat engines can be gauged. Reversible
computation together with the erasure principle provides
a similar ideal limit for minimally heat-generating computation.
Appendix A: Formal setting

In this appendix we formalize the setting and the conditions
for an erasure process that we use to derive Theorem 1.

Setting: our setting consists of a system S, a quantum
memory, O, a heat bath at temperature T, a battery
and a reference system, R (Fig. 2), so that the initial
global state is pure, and the Hamiltonian of the composed
system S ⊗ O is fully degenerate.

Allowed actions: the following physical processes on 
any subsystem, X, of S ⊗ O are allowed: unitary transformations
on X; manipulation of the energy levels of X;
coupling between X and the heat bath or battery. One 
may not perform any operations on the reference system. 

Erasure process: in the setting described, a successful
erasure process is one that erases system S and preserves
the joint state of the memory and the reference,
$\rho_{OR}$. A system is said to be erased when it is in a
pre-defined
pure state. The work cost of the process is defined as 
the difference between the initial and final charge of the 
battery. 

Altering the energy of a state from $E_0$ to $E_0 + \Delta E$
has an average energy cost of $\langle n \rangle \Delta E$,
where $\langle n \rangle$ is the
probability that the system is in that state. This energy
can be withdrawn from a battery, modelled as follows. 

Battery. A battery is a system characterized by an 
energy value, E, called charge, and the following opera- 
tions: 

• Withdrawing energy (performing work). If performing
an operation on a system requires energy $\Delta E$,
coupling between the system and the battery is modelled
by performing that operation and decreasing the charge
of the battery by $\Delta E$.

• Storing energy (extracting work). Conversely, if an
operation on a system has a negative energy cost
$-\Delta E$, coupling the battery to the system and
performing the operation results in an increment of
$\Delta E$ of the charge of the battery.
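
As an illustration only, the battery model can be sketched in a few lines of Python. The names `Battery` and `shift_level` are hypothetical and introduced here for the sketch; they are not part of the formal setting. The sketch simply tracks the charge and applies the average cost $\langle n \rangle \Delta E$ of shifting a level that is occupied with probability $\langle n \rangle$.

```python
from dataclasses import dataclass


@dataclass
class Battery:
    """Idealised battery: a single number (the charge) tracking stored energy."""
    charge: float = 0.0

    def withdraw(self, energy: float) -> None:
        """Pay a non-negative energy cost (performing work on a system)."""
        self.charge -= energy

    def store(self, energy: float) -> None:
        """Bank a non-negative energy gain (extracting work from a system)."""
        self.charge += energy


def shift_level(battery: Battery, occupation: float, delta_e: float) -> None:
    """Shift one energy level by delta_e when it is occupied with probability
    `occupation`; the average cost occupation * delta_e goes to the battery."""
    cost = occupation * delta_e
    if cost >= 0:
        battery.withdraw(cost)
    else:
        battery.store(-cost)


battery = Battery(charge=1.0)
shift_level(battery, occupation=0.25, delta_e=0.4)    # raising a level costs energy
shift_level(battery, occupation=0.25, delta_e=-0.4)   # lowering it returns the energy
work_cost = 1.0 - battery.charge                      # initial minus final charge
```

In this toy run the level is raised and lowered at the same occupation, so the net work cost vanishes; in the actual protocols the occupation changes between steps because the system is re-thermalized.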

Heat bath. We assume that the heat bath is large 
enough to thermalize a system like S without altering its 
own temperature. We model contact between a system 
and the heat bath by replacing the state of the system 
with a thermal Gibbs state of temperature T. Physically, 
this corresponds to letting the system be in contact with
the heat bath for long enough to thermalize. This condition
does not imply that the state of the heat bath does not 
change — it does, losing or gaining the energy required 
to thermalize the system, but not enough to affect the 
temperature of the bath. 
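
The thermalization step can be illustrated with a small sketch that returns the occupation probabilities of a Gibbs state for a given list of energy levels. The function name `gibbs_probabilities` and the choice of units (energies measured in units of kT when `kT=1.0`) are assumptions made for this example.

```python
import math


def gibbs_probabilities(energies, kT):
    """Occupation probabilities of a thermal (Gibbs) state at temperature T."""
    e_min = min(energies)  # subtract the minimum for numerical stability
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# A fully degenerate two-qubit system thermalizes to the fully mixed state.
degenerate = gibbs_probabilities([0.0, 0.0, 0.0, 0.0], kT=1.0)

# A level lifted far above kT is essentially unoccupied after thermalization.
lifted = gibbs_probabilities([0.0, 50.0], kT=1.0)
```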

Appendix B: Smooth entropies 

The main result, Theorem 1, relies on the smooth
max-entropy, $H^{\varepsilon}_{\max}$, as a measure to
quantify uncertainty [TJ.
Smooth entropies have, so far, mainly been used in in- 
formation theory, where they proved to be the relevant 
quantities to characterize information-processing tasks 
such as randomness or entanglement distillation, chan- 
nel coding, data compression, or key distribution. 

The formulation of the entropy-work relation in terms
of the smooth max-entropy, rather than the more standard
von Neumann entropy, has the advantage that the
relation is valid independently of the structure of the
underlying quantum states. A work-entropy relation
involving the von Neumann entropy (Corollary 2) is obtained
from this general result by introducing appropriate
assumptions, as explained below.

In the following, we briefly review the definition of 
smooth entropies and show how they are related to the 
von Neumann entropy. For a more detailed discussion of 
smooth entropies, their properties, and their
information-theoretic significance, we refer to [7, 43-45].



1. Definition and properties 

Let $\rho = \rho_{SO}$ be the state of a bipartite system,
consisting of subsystems S and O. The $\varepsilon$-smooth
max-entropy of S conditioned on O can be expressed in
terms of the fidelity F (footnote 7) as

$$H^{\varepsilon}_{\max}(S|O)_{\rho} := \inf_{\rho'_{SO}} \sup_{\sigma_O} \log_2 F(\rho'_{SO}, \mathbb{1}_S \otimes \sigma_O)^2 .$$

The supremum ranges over all density operators $\sigma_O$ on
O. The infimum is taken over all (subnormalized) density
operators $\rho'_{SO}$ that are $\varepsilon$-close
(footnote 8) to $\rho_{SO}$, where $\varepsilon \geq 0$ is
the smoothness parameter, which is usually chosen to be
small but nonzero.

The proof of Theorem 1 also involves the smooth
min-entropy, which can be seen as the dual of the smooth
max-entropy, in the following sense. Consider a
purification $\rho_{SOT}$ of the given bipartite state
$\rho_{SO}$, with a purifying system T. The
$\varepsilon$-smooth min-entropy of S conditioned on T then
corresponds to the negative smooth max-entropy
conditioned on O [HI HB],

$$H^{\varepsilon}_{\min}(S|T)_{\rho} = -H^{\varepsilon}_{\max}(S|O)_{\rho}. \qquad \text{(B1)}$$

Smooth entropies have properties analogous to those
of the von Neumann entropy. For example, for
$\varepsilon \to 0$, both $H^{\varepsilon}_{\min}(S|O)_{\rho}$
and $H^{\varepsilon}_{\max}(S|O)_{\rho}$ are 0 if the reduced
state on S is pure, 1 for a qubit S that is fully mixed
and uncorrelated to O, and $-1$ for a qubit S that is
maximally entangled with O. Furthermore, they satisfy
a data-processing inequality. It asserts that the entropy
of S conditioned on O can only increase if information is
processed locally at O. Formally,

$$H^{\varepsilon}_{\max}(S|O')_{\rho'} \geq H^{\varepsilon}_{\max}(S|O)_{\rho},$$

where $\rho' = \rho_{SO'}$ is the state obtained from
$\rho_{SO}$ when a completely positive map is applied on
system O.



7 Note that the fidelity can be defined for arbitrary (not
necessarily normalized) positive operators, R and S, by
$F(R,S) := \|\sqrt{R}\sqrt{S}\|_1$, where $\|\cdot\|_1$ is
the $L_1$-norm.

8 Closeness is measured in terms of the purified distance [46].
2. Specialization to the von Neumann entropy

For a bipartite quantum state $\rho_{SO}$, the von Neumann
entropy of S conditioned on O is defined by
$H(S|O)_{\rho} = H(\rho_{SO}) - H(\rho_O)$, where $H(\sigma)$
denotes the usual (non-conditional) von Neumann entropy
of $\sigma$, i.e., $H(\sigma) = -\mathrm{Tr}(\sigma \log_2 \sigma)$.
The conditional von Neumann entropy is
always bounded by the smooth min- and max-entropies,

$$\lim_{\varepsilon \to 0} H^{\varepsilon}_{\min}(S|O)_{\rho} \leq H(S|O)_{\rho} \leq \lim_{\varepsilon \to 0} H^{\varepsilon}_{\max}(S|O)_{\rho}. \qquad \text{(B2)}$$

In particular, if the smooth min- and max-entropies
coincide, they are automatically equal to the von Neumann
entropy. Hence, under this condition, the smooth
max-entropy occurring in Theorem 1 can be replaced by the
von Neumann entropy.
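
The three reference values quoted for conditional entropies (0, 1 and -1) can be checked numerically for the conditional von Neumann entropy $H(S|O) = H(\rho_{SO}) - H(\rho_O)$. The following is a minimal NumPy sketch; the function names are hypothetical, and the example only covers the von Neumann case, not the smooth entropies.

```python
import numpy as np


def von_neumann_entropy(rho):
    """H(rho) = -Tr(rho log2 rho), computed from the eigenvalues of rho."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # drop zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(eigvals * np.log2(eigvals)))


def reduced_state_on_O(rho_so, dim_s, dim_o):
    """Trace out S from a state on S (x) O, leaving the reduced state on O."""
    rho = rho_so.reshape(dim_s, dim_o, dim_s, dim_o)
    return np.einsum('iaib->ab', rho)


def conditional_entropy(rho_so, dim_s, dim_o):
    """H(S|O) = H(SO) - H(O), in bits."""
    rho_o = reduced_state_on_O(rho_so, dim_s, dim_o)
    return von_neumann_entropy(rho_so) - von_neumann_entropy(rho_o)


# Qubit S maximally entangled with qubit O.
bell = np.zeros(4)
bell[0] = bell[3] = 1 / np.sqrt(2)
rho_entangled = np.outer(bell, bell)

# Qubit S fully mixed and uncorrelated with a fully mixed qubit O.
rho_mixed = np.eye(4) / 4

# Qubit S in a pure state, uncorrelated with a fully mixed qubit O.
rho_pure_s = np.kron(np.diag([1.0, 0.0]), np.eye(2) / 2)

h_entangled = conditional_entropy(rho_entangled, 2, 2)
h_mixed = conditional_entropy(rho_mixed, 2, 2)
h_pure = conditional_entropy(rho_pure_s, 2, 2)
```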



A typical situation where Eq. B2 holds (approximately)
with equality is that of a large n-partite system
with weakly correlated parts. In the limit when the
correlations disappear, the state of the system is
independent and identically distributed (i.i.d.), i.e., of
the form $\sigma^{\otimes n}$. Such states are common in
information theory and physics; they arise, for instance,
naturally for systems with sufficiently high symmetries
(e.g., when a system is invariant under permutations of
its n parts [47]). One can show that the smooth min- and
max-entropies converge for states of the form
$\rho_{S^n O^n} = \sigma_{SO}^{\otimes n}$ [32]. Hence,
by virtue of Eq. B2 and using the fact that the von
Neumann entropy is additive, one has, for any
$\varepsilon > 0$,

$$\lim_{n \to \infty} \frac{1}{n} H^{\varepsilon}_{\max}(S^n|O^n)_{\sigma^{\otimes n}} = \lim_{n \to \infty} \frac{1}{n} H^{\varepsilon}_{\min}(S^n|O^n)_{\sigma^{\otimes n}} = H(S|O)_{\sigma}. \qquad \text{(B3)}$$

In other words, for i.i.d. states, the work-entropy
relation of Theorem 1 asymptotically also holds for the von
Neumann entropy.

We note that Eq. B3 can be seen as a reformulation
of the Asymptotic Equipartition Property, which plays a
crucial role in the area of information theory. There,
operational quantities (such as the compression rate of a
random source or the amount of randomness that can be
distilled from a given source) are usually related to
either the smooth min- or the smooth max-entropy. The
widespread use of the von Neumann entropy in (textbook)
information theory is therefore mainly a consequence of
the fact that one typically considers i.i.d. situations,
such that Eq. B2 holds with equality.



Appendix C: Information Compression

Here we address information compression, used in the
first step of the erasure process; in particular, we prove
the bound of Eq. 4 of Section III.

Information compression uses correlations between two
systems, S and O, as measured by an entropy measure,
to create a pure state in a subsystem of $S \otimes O$,
using only local reversible transformations on S. In this
result, we consider a global system $S \otimes O \otimes T$.
In the context of our work, S is the system the observer
is trying to erase, O the memory of the observer, and T
is formed by the battery, the heat bath and the reference
system.

Theorem 2. Given a system $\Omega = S \otimes O \otimes T$
in a pure state, where S is an n-qubit system, it is
possible to create an $\ell$-qubit state of a subsystem of
$S \otimes O$, with

$$\ell \geq n - H^{\varepsilon}_{\max}(S|O) + 2\log_2(\delta^2 - 12\varepsilon),$$

that is $\delta$-close to a pure state, applying a local
unitary transformation on S.

The last term is usually small. For instance, for
$\delta = 0.03$ and $\varepsilon = 10^{-6}$, we have
$2\log_2(\delta^2 - 12\varepsilon) \approx -20$.
If the system S is large (say $\approx 1000$ qubits), this
logarithmic term can be neglected.

We will see later that the erasure process fails with
maximum probability $\delta$. This means that allowing a
probability of failure of 3% has a cost of 10 qubits in
the size of $S_1$, and results in an increase of
$20\, kT \ln 2$ in the work consumption of the erasure
process (see Section III).

The proof of Theorem 2 consists of two steps: first we
will decouple a subsystem $S_1 \subset S$, of $\ell/2$
qubits, from T. Then we will see that, since the global
state is pure, $S_1$ is purified by a subsystem of
$S \otimes O$ of the same dimension. The pure state created
has a total of $\ell$ qubits.
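
To get a feeling for the size of the logarithmic correction in Theorem 2, the following sketch evaluates $2\log_2(\delta^2 - 12\varepsilon)$ for a 3% failure probability ($\delta = 0.03$) and $\varepsilon = 10^{-6}$. The function names and the sample values of n and of the max-entropy are illustrative assumptions; note that the expression is only defined when $\delta^2 > 12\varepsilon$.

```python
import math


def log_correction(delta: float, eps: float) -> float:
    """The term 2*log2(delta^2 - 12*eps) appearing in Theorem 2."""
    argument = delta ** 2 - 12 * eps
    if argument <= 0:
        raise ValueError("need delta^2 > 12*eps for the bound to be defined")
    return 2 * math.log2(argument)


def compressed_size_bound(n: int, h_max: float, delta: float, eps: float) -> float:
    """Lower bound n - H_max(S|O) + 2*log2(delta^2 - 12*eps) on the size l."""
    return n - h_max + log_correction(delta, eps)


correction = log_correction(delta=0.03, eps=1e-6)   # roughly -20 qubits

# Hypothetical example: 1000 qubits whose smooth max-entropy given O is 200.
bound = compressed_size_bound(n=1000, h_max=200.0, delta=0.03, eps=1e-6)
```

For a system of about a thousand qubits the correction changes the bound by roughly two percent, which is the sense in which the term can be neglected.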



1. Decoupling 

In this first step, we show that it is in general possible 
to identify a subsystem of S that can be decoupled from
T, according to the following definition.

Definition 1 (Decoupling). A system, X, is
$\delta'$-decoupled from another system, Y, if their joint
state is $\delta'$-close to a product state,

$$\delta\left(\rho_{XY}, \frac{\mathbb{1}_X}{|X|} \otimes \rho_Y\right) \leq \delta',$$

where $\delta(\rho, \sigma)$ is the trace distance between
two states.



Lemma 1 will show that the size of the decoupled system
depends on the correlations between S and O, as
measured by an entropy measure, the smooth conditional
max-entropy, $H^{\varepsilon}_{\max}(S|O)$. This result uses
the procedure
of decoupling, first introduced by [3] and generalized by 

Lemma 1. Given a system $\Omega = S \otimes O \otimes T$ in
a pure state, where S is an n-qubit system, it is possible
to $\delta'$-decouple an m-qubit subsystem of S, $S_1$,
from T. The maximum size of $S_1$ is given by

$$m \geq \frac{n - H^{\varepsilon}_{\max}(S|O)}{2} + \log_2(2\delta' - 12\varepsilon).$$



Proof. The decoupling results [33, 48] imply that the
average distance between the state actually obtained after
applying a unitary on S and the desired, decoupled state,
is bounded by

$$\int \delta\left(\mathrm{Tr}_{S_2}\left[(U_S \otimes \mathbb{1}_T)\,\rho^0_{ST}\,(U_S \otimes \mathbb{1}_T)^{\dagger}\right], \frac{\mathbb{1}_{S_1}}{|S_1|} \otimes \rho^0_T\right) dU_S \leq \frac{1}{2}\, 2^{-\frac{1}{2}\left[H^{\varepsilon}_{\min}(S|T)_{\rho^0} + n - 2m\right]} + 6\varepsilon. \qquad \text{(C1)}$$

Here, the integral is taken over all unitary operations
on system S, and $H^{\varepsilon}_{\min}(S|T)_{\rho^0}$ is
the smooth conditional min-entropy of S, given the
information that T may provide about that system, before
applying $U_S$. Since the bound of Eq. C1 applies to the
average over all unitary operators, there is at least one
fixed unitary, $U_S$, that respects it. For an upper bound
of $\delta'$ on the distance between the desired and the
obtained states, we have

$$m \geq \frac{n + H^{\varepsilon}_{\min}(S|T)_{\rho^0}}{2} + \log_2(2\delta' - 12\varepsilon). \qquad \text{(C2)}$$

The global state is pure, so one may use the duality
relation between entropy measures, introduced in Eq. B1 of
Appendix B,
$H^{\varepsilon}_{\min}(S|T)_{\rho^0} = -H^{\varepsilon}_{\max}(S|O)_{\rho^0}$,
where the latter is the smooth conditional max-entropy of
system S given the memory. Inserting this into Eq. C2 we
obtain

$$m \geq \frac{n - H^{\varepsilon}_{\max}(S|O)}{2} + \log_2(2\delta' - 12\varepsilon).$$

□



It can be proved that the bound of Lemma 1 is optimal,
i.e., that there is no unitary $U_S$ that allows us to
decouple a system with more than m qubits from T [48].



2. Purification 

To complete the proof of Theorem 2, it remains to show
that, given an $\ell/2$-qubit system $S_1$ decoupled from
T, it is possible to find an $\ell$-qubit pure state in a
subsystem of $S \otimes O$. Note that the global state of
$S \otimes O \otimes T$ is still pure, for we have only
applied a local unitary transformation on S.

Lemma 2. Consider a system
$\Omega = (S_1 \otimes S_2) \otimes O \otimes T$
in a pure state, such that the m-qubit system $S_1$ is
$\delta'$-decoupled from T, in a fully mixed state.

It is possible to find an m-qubit subsystem P of
$S_2 \otimes O$ that purifies the state of $S_1$ such that
the joint state of $S_1 \otimes P$ is
$\sqrt{2\delta'}$-close to a fully entangled state.



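Proof. In a first step we assume that the state of $S_1$
and T is fully decoupled. We can expand it as

$$\rho_{S_1} \otimes \rho_T = 2^{-m} \sum_k |k\rangle\langle k|_{S_1} \otimes \sum_i \lambda_i |i\rangle\langle i|_T .$$

We can find systems $A_1$ and $A_2$ that purify
$\rho_{S_1}$ and $\rho_T$. The composite system
$A_1 \otimes A_2$ purifies $\rho_{S_1} \otimes \rho_T$,

$$|\phi\rangle = |\phi'\rangle_{S_1 A_1} \otimes |\phi''\rangle_{T A_2} = 2^{-m/2} \sum_k |k\rangle_{S_1} |k\rangle_{A_1} \otimes \sum_i \sqrt{\lambda_i}\, |i\rangle_T |i\rangle_{A_2} .$$

The statement for $\delta' = 0$ follows now from the fact
that any two purifications of the same state are related
by a unitary transformation on the purifying system. In
particular, P is given as the image of $A_1$ under this
unitary. The claim for strictly positive $\delta'$ follows
similarly,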
using Uhlmann's theorem and properties of the trace dis- 
tance Lem. 6]. 

□ 



Appendix D: Work extraction 

In this appendix we introduce in detail a process that 
allows us to extract energy from a heat bath and store 
it in a battery, using a pure state of a system X, as 
introduced in Fig. [3] 

Theorem 3. Consider an $\ell$-qubit subsystem of
$S \otimes O$, X, with a fully degenerate Hamiltonian, in a
pure state. Using a heat bath at temperature T and a
battery, it is possible to extract exactly
$\ell\, kT \ln 2$ of work. The system is left in a fully
mixed state, and the final Hamiltonian is the same as the
initial one.

Proof. Let $E_0$ be the energy of the initial state of X,
$|\phi_0\rangle$, for a basis $\{|\phi_i\rangle\}_i$,
$i = 0, 1, \ldots, N$. We start by lifting the energy of
all unoccupied states
$\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ to a high
value, $E_1$. This can be done with no energy cost, because
those states are empty (Fig. 3 a).

Now we couple X to the heat bath and let it thermalize;
X is taken to a Gibbs state of temperature T. The
probability that X is in each of the states
$\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ is given by
$\langle n \rangle = [N + e^{\beta(E_1 - E_0)}]^{-1}$,
where $\beta = (kT)^{-1}$. In total, the probability that
the system is in one of the levels raised is
$N\langle n \rangle = [1 + e^{\beta(E_1 - E_0)}/N]^{-1}$.

We then couple X to the battery and lower the energy
of levels $\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ by a
small amount $\Delta$. Since those states were partially
occupied, this operation gives us a small amount of
energy, $N\langle n \rangle \Delta$, that is stored in
the battery (Fig. 3 b).

We wait for the system to thermalize again. Because
levels $\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ have
slightly lower energy than before, they will become a
little more populated, so the machine can extract a little
more energy when it decreases the energy of the levels by
another $\Delta$. The process is repeated until the energy
of states $\{|\phi_1\rangle, \ldots, |\phi_N\rangle\}$ is
lowered to $E_0$. At this point all $\{|\phi_i\rangle\}_i$
are degenerate again and the state of X is fully mixed
(Fig. 3 c).

In the quasistatic limit of $\Delta \to 0$ and
$E_1 \to \infty$, this process allows us to extract a
total amount of work of

$$\lim_{E_1 \to \infty} \int_{E_0}^{E_1} \frac{1}{1 + e^{\beta(E - E_0)}/N}\, dE = \frac{\ln(N+1)}{\beta} = \log_2|X|\; kT \ln 2. \qquad \text{(D1)}$$



□ 
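
The stepwise procedure in the proof can be checked numerically. The sketch below lowers the N raised levels from a large but finite $E_1$ to $E_0 = 0$ in many small steps and accumulates $N\langle n \rangle \Delta$ at each step; the result should approach $kT \ln(N+1)$. The function name, the cut-off `e1` and the number of steps are assumptions of this illustration, and energies are measured in units where kT = 1 by default.

```python
import math


def extracted_work(n_levels: int, kT: float = 1.0,
                   e1: float = 40.0, steps: int = 200_000) -> float:
    """Accumulate N<n>*Delta while lowering N raised levels from e1 to E0 = 0."""
    delta = e1 / steps
    work = 0.0
    for j in range(steps):
        energy = e1 - j * delta  # current energy of the raised levels
        # Probability of occupying any raised level in the Gibbs state.
        occupation = 1.0 / (1.0 + math.exp(energy / kT) / n_levels)
        work += occupation * delta  # energy stored in the battery in this step
    return work


# Two qubits: |X| = 4 levels, i.e. N = 3 levels are raised.
work_two_qubits = extracted_work(n_levels=3)
work_in_bits = work_two_qubits / math.log(2)  # in units of kT ln 2
```

For two qubits the accumulated work comes out close to $2\, kT \ln 2$, in line with Eq. D1; the small residual difference comes from the finite step size and the finite cut-off.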



The process described in Theorem [3] takes a system 
from a pure to a fully mixed state, extracting some work 
in the process. By inverting the process (Fig. [I]), one can 
bring a system from a fully mixed to a pure state — in 
other words, erase the system. 

Corollary 3. To erase an $\ell$-qubit system initially in a
fully mixed state, using a heat bath at temperature T, it
is sufficient to perform work $\ell\, kT \ln 2$.

When compressing information between the system 
and the memory, we allowed the state created to be at
most $\delta$-distant from a pure state (Appendix C). The
following lemma shows how that affects the probability of
failure of the work extraction procedure. 

Lemma 3. If the process described in Theorem 3 is applied
to a state $\delta$-close to a pure state, it succeeds with
probability at least $1 - \delta$.

Proof. The probability that two states, $\rho$ and
$\sigma$, of the same system can be distinguished in a
one-shot approach using a physical process, such as a
measurement after a reversible evolution, is given by
$\Pr_{\max}(\rho, \sigma) = \frac{1}{2}[1 + \delta(\rho, \sigma)]$,
where $\delta(\rho, \sigma)$ is the trace distance between
those states.

An example of a process to distinguish two states is
the work extraction process described in Theorem 3. If
the process is applied to the expected pure state,
$\sigma$, the probability of error is zero and the quantity
of work extracted is $\ell\, kT \ln 2$. We denote the
probability of failure of the work extraction process for
an arbitrary state, $\rho$, by $p_\rho$.

If we are given one of the two states, $\sigma$ and
$\rho$, at random, apply the work extraction process and
obtain less than $\ell\, kT \ln 2$, we know that the state
was $\rho$. This happens with probability $p_\rho/2$. In
$(1 - p_\rho)/2$ of the cases, we are given $\rho$ and
extract exactly $\ell\, kT \ln 2$, and with probability 1/2
we had $\sigma$, extracting the same work, so our best
guess if we obtain work $\ell\, kT \ln 2$ is to say we
had state $\sigma$. In total, we will be right with
probability $\frac{1}{2}[1 + p_\rho]$.

This guessing probability is upper bounded by
$\Pr_{\max}(\rho, \sigma)$, so
$p_\rho \leq \delta(\sigma, \rho)$. Since we imposed a
maximum distance $\delta$ between the pure state $\sigma$
and $\rho$, the probability of failure of the process is
at most $\delta$.

□ 
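
The two quantities used in this argument, the trace distance and the optimal one-shot distinguishing probability, are easy to evaluate for small examples. The sketch below uses NumPy; the function names and the example states (a pure qubit state and a slightly depolarized version of it) are assumptions chosen for illustration.

```python
import numpy as np


def trace_distance(rho, sigma):
    """delta(rho, sigma) = (1/2) * ||rho - sigma||_1 for Hermitian inputs."""
    eigvals = np.linalg.eigvalsh(rho - sigma)
    return 0.5 * float(np.sum(np.abs(eigvals)))


def max_guess_probability(rho, sigma):
    """Best probability of telling apart two equiprobable states."""
    return 0.5 * (1.0 + trace_distance(rho, sigma))


sigma = np.diag([1.0, 0.0])       # expected pure state |0><0|
q = 0.05
rho = np.diag([1.0 - q, q])       # state actually prepared, delta-close to sigma

delta = trace_distance(rho, sigma)
p_guess = max_guess_probability(rho, sigma)

# Any failure probability p with (1 + p)/2 <= p_guess must satisfy p <= delta.
largest_allowed_failure = 2.0 * p_guess - 1.0
```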



Appendix E: Not-so-brief clarification 

The following notes concern the published version of 
this manuscript [42] , amplifying on some points and clar- 
ifying its relation to earlier work on reversible computing, 
Landauer's principle, and Maxwell's demon [21 [4TJI WT\ . 



Non-cyclic erasure 

Our paper deals with the isothermal work (at temperature
T) required by an observer O to erase a system S, in
other words restore it to a standard pure state,
$|0\rangle$. The observer may initially have knowledge
about the system: O may be classically correlated or even
entangled with S. We show that the work cost rate required
for erasure is given by

$$w(S|O) = H(S|O)\, kT \ln 2, \qquad \text{(E1)}$$

where $H(S|O)$ is the conditional entropy of S given O.

If the observer O is classical, the work cost of erasure 
can be zero (when O has complete information on S) or 
positive (when O has partial or no information on S), 
but can never be negative. More generally, however, the 
observer O may hold quantum information (that cannot 
be represented by a classical value) , and the study of this 
more general situation is the main goal of our paper. In 
particular, the initial correlations between S and O may
be quantum, and $H(S|O)$ can be negative. Eq. E1 thus
provides an interpretation of this negative conditional
entropy, namely that it corresponds to a work yield, rather
than a work cost, associated with performing the erasure.

Landauer's principle was originally formulated as "the
cost of erasing an unknown bit is $kT \ln 2$", and is
generally taken to refer to a more limited situation, where
there are no correlations between S and O, in other words
where the observer is entirely ignorant of the system
being erased. Conversely, a work yield of $kT \ln 2$ can be
obtained by quasi-statically allowing a qubit in an initial
pure state to randomize itself at temperature T
(Figure 3). In other words, one can gain work at the cost
of losing all the initial information about the state of the 
qubit. Landauer's principle and its converse are generally 
seen as straightforward manifestations of the second law 
of thermodynamics, as applied to data-processing sys- 
tems, and they can be applied in a cyclic fashion, e.g. to 
assess the work that Maxwell's demon needs to expend 
to clear its memory at the end of each cycle of operation. 

In the case where the observer O has non-trivial
classical information about S, Landauer's principle may be
refined to $w(S) = H(S)\, kT \ln 2$, where the entropy
$H(S)$ is evaluated for the state of S conditioned on the
knowledge held by O. Note that this is consistent with
Eq. E1, where the classical knowledge of O is made
explicit, rather than taken implicitly in the definition
of the (conditional) state of S.

However, in the general case of an observer who may
hold quantum information about S, the implicit
formulation of $w(S) = H(S)\, kT \ln 2$ is no longer
possible, and conditional entropies are necessary to
describe the knowledge of O about S. In this sense, Eq. E1
can be seen as a strict generalization of Landauer's
principle to situations involving non-classical observers.
Let us reconsider the example of Quasimodo, who holds a
quantum memory Q maximally entangled with an n-qubit
system S. It is one of the central and celebrated features
of quantum mechanics that, in this entangled state, S and
Q each appear maximally random, while the joint SQ system
is in a pure state of zero entropy. In other words,
we have $H(SQ) = 0$, $H(S) = H(Q) = n$, and therefore
$H(S|Q) = -n$. Eq. E1 (with Q taking the place of the
observer, O) therefore implies that the n-qubit system S
can be erased with a negative work cost, i.e., a positive
work yield, of $n\, kT \ln 2$. As before, erasure means
that S is brought to a standard pure state $|0\rangle$,
whereas Q should
remain unchanged, in the sense that the reduced density 
operator that describes the state of Q should be the same 
before and after the erasure of S. This follows from our 
information-preservation condition, which demands that 
erasure of S must not affect any other information held 
by the observer. 
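
For orientation, Eq. E1 can be turned into joules. The sketch below assumes a bath at T = 300 K (an illustrative choice, not a value from the text) and uses the SI value of the Boltzmann constant; the function name `erasure_work` is hypothetical. A negative result is a work yield.

```python
import math

BOLTZMANN = 1.380649e-23  # J/K, exact SI value


def erasure_work(cond_entropy_bits: float, temperature: float) -> float:
    """Work cost w = H(S|O) * k * T * ln 2 in joules (negative means a yield)."""
    return cond_entropy_bits * BOLTZMANN * temperature * math.log(2)


T_ROOM = 300.0  # kelvin, assumed for illustration

# Classical Landauer limit: one unknown bit, H(S|O) = 1.
cost_one_bit = erasure_work(1.0, T_ROOM)

# Quasimodo's case: n = 1000 qubits maximally entangled with Q, H(S|Q) = -n.
yield_entangled = erasure_work(-1000.0, T_ROOM)
```

At room temperature the unit $kT \ln 2$ is of order $10^{-21}$ J, so even a thousand fully entangled qubits correspond to a yield of only a few attojoules.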

This erasure process might appear to risk violating the 
second law of thermodynamics, for example by repeat- 
edly allowing S to randomize itself, then extracting work 
as it is erased, in a cyclic fashion. But in fact no such vi- 
olation occurs, because the erasure process uses up, and 
does not replace, the initial entanglement between S and 
Q, thereby preventing the cycle from repeating. 

During the non-cyclic erasure, the joint system SQ
evolves from a pure initial state to a final state with n
bits of entropy, an entropy increase that can be harnessed
to do $n\, kT \ln 2$ of work, violating neither the second
law nor the original unconditional form of Landauer's
principle (which applies to observers having no
information, classical or quantum, about the system being
erased; by contrast, our extended Landauer's principle,
Eq. E1, covers observers with classical or quantum
information).

We can think of entanglement as a thermodynamic re- 
source, a sort of very concentrated fuel: the consumption 
of one unit of entanglement can simultaneously erase a 
qubit and convert $kT \ln 2$ of heat into work, two tasks
that would otherwise require the consumption of one bit 
of classical information each. In other words, quantum 
information can be a thermodynamic resource twice as 
powerful as classical information. 



Erasure in the context of reversible computation 

The published version of this manuscript includes a 
brief note on the application of erasure in algorithms
to make computation more thermodynamically efficient
(Figure 1 and Supplementary Information, Section V,
of [H]). In the following we explain the relation of our 
work to established results on reversible computation, in 
particular [1 \M ETJ \M ISO] • 

Consider a quantum algorithm with input X and output Y
(the algorithm may realize an arbitrary, not necessarily
classical, mapping). Using extra (initialized) ancilla
registers, R, the algorithm can always be implemented
reversibly [2j HQl [41], corresponding to an isometry
that maps any initial state on X to a joint state on
Y and R. Eq. E1 now tells us that the ancillas R can in
principle be erased (i.e., reset to their initial state)
at a work cost rate of $w(R|Y) = H(R|Y)\, kT \ln 2$. In
general, Y and R may be entangled and the work cost may be
negative, so that erasing the ancillas results in a gain
of work.

Efficient erasure of the ancillas is well-established in
the theory of computation for the case of deterministic
algorithms (see, e.g., Figure 8 of [H5]). First note that all
deterministic functions can be made injective (so that X 
is determined by Y) by treating the input as part of the 
output. This makes the algorithm reversible and the en- 
tropy H(R\Y) zero. Hence, there must exist a procedure 
for erasing the ancillas at no energy cost. This erasure 
can be done efficiently as follows. After the execution 
of the algorithm, the classical output Y is copied to a 
separate register. Then the reversible algorithm is run 
backwards, thereby resetting the ancillas R to their ini- 
tial state. Note that this procedure requires the output Y 
to be classical (otherwise the copy operation may affect 
the joint state of Y and R). 
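
The compute, copy, uncompute procedure described above can be sketched with classical reversible gates. In the toy example below, the function computed (the AND of three input bits), the wire layout and all names are assumptions made for illustration; the only ingredients are Toffoli and CNOT gates, which are their own inverses, so running the gate list backwards undoes the computation.

```python
from itertools import product


def toffoli(bits, a, b, target):
    """Reversible AND: flip `target` if both control bits are set."""
    bits[target] ^= bits[a] & bits[b]


def cnot(bits, control, target):
    """Reversible copy onto a blank bit: flip `target` if `control` is set."""
    bits[target] ^= bits[control]


# Wire layout (hypothetical): 0-2 input X, 3 ancilla R, 4 output Y, 5 copy of Y.
FORWARD = [
    (toffoli, (0, 1, 3)),  # R = x0 AND x1
    (toffoli, (3, 2, 4)),  # Y = R AND x2
]


def run(bits, gates):
    for gate, wires in gates:
        gate(bits, *wires)


def compute_and_clean(x):
    """Compute Y, copy it out, then run the circuit backwards to reset R."""
    bits = list(x) + [0, 0, 0]
    run(bits, FORWARD)                   # forward computation
    cnot(bits, 4, 5)                     # copy the classical output
    run(bits, list(reversed(FORWARD)))   # uncompute: gates are self-inverse
    return bits


results = {x: compute_and_clean(x) for x in product([0, 1], repeat=3)}
```

After the backward run the ancilla and the original output wire are back in their initial state 0, while the input and the copied output survive, so nothing has to be erased irreversibly.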

In fact, any probabilistic algorithm for a decision prob- 
lem (or, more generally, the computation of a classical 
function, such as factoring) that receives a classical in- 
put can be boosted to a virtually deterministic one by 
repeated iterations of the algorithm followed by a major- 
ity vote, so that the above considerations apply [4lH 150] . 

However, the described procedures require both the 
input and the output of the algorithm to be classical. 
It would be interesting to apply our results to the more 
general case of algorithms with quantum input or output. 
Examples could be the simulation of a physical system, or 
a tomography-type procedure that takes a finite number 
of copies of a quantum state as input and should output 
an estimation of its density matrix. 

It is perhaps worth noting that our result only implies 
the existence of an erasure procedure with a given work 
cost or gain; we do not show how to implement such a 
procedure, or whether it can be done efficiently. 